Daniel Eaton Beta Grid Tutorial 

This tutorial is for Matlab users, and aims to get you running in parallel painlessly. The Beta Grid Cluster has 21 machines, each with 1-2 processors and 1-4 GB of RAM (41 Intel 3.06 GHz Xeon CPUs in total). They run SuSE Linux. Matlab 7.0.1 (R14) is installed. Kevin Liang is the administrator.

Important NB: All processing is managed with the Sun N1 Grid Engine (SGE) software. You submit jobs to this software, which then delegates nodes in the cluster to process them. You do not execute code on the individual nodes (hosts) yourself (in fairness to other users).

Before you use the cluster:
  1. Get yourself access. Talk to your supervisor, to one of the Beta people, or to Kevin Liang. This was handled for me (Kevin LB authorized me, Dave Brent added me to the betaguests netgroup).
  2. Get network storage. Especially if your code is data-intensive, make sure you have access to a big network scratch disk, which is globally accessible within CS. In my case, I use /cs/SCRATCH.

Using the cluster:
  1. Start on a Linux command-line. The SGE (cluster management) software is presently only compiled for Linux, so you can only submit jobs from a Linux machine (read: not from Windows, nor from the department SunOS servers). Personally, I SSH to one of the cluster hosts (e.g. aluminum.icics.ubc.ca) and do all my job submission from there. From here on I assume you are on a Linux command-line, logged into your CS account.
  2. Setup. Assuming you're using TCSH (otherwise, see Preparation), punch in:
    source /cs/beta/lib/pkg/sge/beta_grid/common/settings.csh
    which sets some environment variables. Now, type qconf -ss, which should echo the cluster host-list if you have been granted grid access and are properly set up.
  3. Submit a job. Jobs are just shell scripts (C shell, i.e. csh, etc.) that carry out your processing task. Here is a sample. Your script should change the directory to wherever your code lives, invoke Matlab, and exit. To submit the job, just invoke: qsub job.sh. SGE will assign your job a number, delegate it to a cluster node, and then return. Workflow:
    1. Write a shell script that cd's to your working directory, and invokes Matlab in non-interactive mode (see remarks below).
    2. qsub myscript.sh
  4. Monitor your job. Shortly after your job is submitted, two files are created in your home directory. Assuming your script filename is batch.sh, and the job number is 12345, then your job's standard output will be redirected to ~/batch.sh.o12345, while its standard error is written to ~/batch.sh.e12345. You can use the SGE program qstat -u user_name to view the status of your job -- "r" means it's running. You can also delete a single job, viz. qdel 12345, or all your jobs with qdel -u user_name.
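
The submit-and-monitor steps above can be sketched roughly as follows. This is only a sketch, not my actual sample script: it is written in Bourne shell rather than csh, and WORKDIR, PROGRAM, JOB and the helper job_logs are made-up names for illustration. The "#$ -S /bin/sh" line is an SGE directive asking for the job to be run with /bin/sh; I am assuming the cluster allows that. The trailing "exit;" in the Matlab code is my addition so that Matlab quits when the program finishes.

```shell
#!/bin/sh
# Sketch of the submit-and-monitor workflow. WORKDIR, PROGRAM and JOB are
# hypothetical placeholders; replace them with your own values.
WORKDIR="${WORKDIR:-$HOME/myproject}"
PROGRAM="${PROGRAM:-program}"
JOB="${JOB:-job.sh}"

# Write the job script: cd to the code, then run Matlab without any GUI.
# The "#$ -S" line (an assumption about this cluster) requests /bin/sh.
cat > "$JOB" <<EOF
#!/bin/sh
#\$ -S /bin/sh
cd "$WORKDIR" || exit 1
matlab -nosplash -nodesktop -nojvm -r "$PROGRAM; exit;"
EOF

# job_logs SCRIPT JOBID: print the default stdout and stderr paths of a job,
# following the ~/SCRIPT.oJOBID and ~/SCRIPT.eJOBID naming described above.
job_logs() {
    name=$(basename "$1")
    echo "$HOME/$name.o$2"
    echo "$HOME/$name.e$2"
}

# Submit only when the SGE tools are on the PATH (i.e. settings were sourced).
if command -v qsub >/dev/null 2>&1; then
    qsub "$JOB"          # reports the job number assigned to the job
    qstat -u "$USER"     # a state of "r" means the job is running
else
    echo "qsub not found; wrote $JOB without submitting it"
fi
```

Once qsub has reported a job number, say 12345, something like tail `job_logs job.sh 12345` prints the last few lines of both output files.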

Misc. remarks:
  • csh != sh. C Shell syntax is substantially different from Bourne Shell syntax. (Aug-18-2005: Apparently you can use any scripting language installed on the cluster.)
  • Matlab needs to be run in non-interactive mode, and the splash screen + desktop (GUI) should also be disabled. This is accomplished by: matlab -r "MATLAB_CODE;" -nosplash -nodesktop -nojvm. Obviously, your code shouldn't try pumping out figures for visible display in this mode. MATLAB_CODE could be "run program;", for example, where program.m is in the working directory you specified or on the Matlab path.
  • With qsub you can append arguments after the script filename -- they will be passed to your script. For example, qsub myscript.sh 32 1964 (see my sample, where I accept 2 arguments).
  • To see what's happening, use the utility tail on the standard output/error files to print their last few lines (assuming your script/Matlab code produces any output).
  • If you use Windows like me, you'll have to re-Mex all your external C/Fortran libraries on Linux.
  • You can specify where the standard output/error files are written (they default to your home directory), but I haven't played with this feature yet.
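
To make the remark about qsub arguments concrete, here is a rough sketch of a job script that accepts two arguments and hands them to Matlab. It is a stand-in rather than my actual sample: myfunction, WORKDIR and build_matlab_code are hypothetical names, the script is Bourne shell rather than csh, and the "#$ -S /bin/sh" directive assumes the cluster lets you pick that shell. The defaults 32 and 1964 only mirror the example above.

```shell
#!/bin/sh
#$ -S /bin/sh
# Sketch of a job script taking two arguments, e.g.:
#   qsub myscript.sh 32 1964
# "myfunction" and WORKDIR are hypothetical; use your own function and path.

# build_matlab_code A B: print the Matlab statement that the job will run.
build_matlab_code() {
    echo "myfunction($1, $2); exit;"
}

# Arguments given after the script name on the qsub line arrive as $1 and $2.
A="${1:-32}"
B="${2:-1964}"
CODE=$(build_matlab_code "$A" "$B")

if command -v matlab >/dev/null 2>&1; then
    cd "${WORKDIR:-$HOME/myproject}" || exit 1
    matlab -nosplash -nodesktop -nojvm -r "$CODE"
else
    # Not on a machine with Matlab: just show what would have been run.
    echo "matlab not found; would run: matlab -nosplash -nodesktop -nojvm -r \"$CODE\""
fi
```

Building the Matlab statement in one small function keeps the quoting in a single place, which is where I would expect most mistakes to creep in.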

Happy clustering.